Addressing Environment Non-Stationarity by Repeating Q-learning Updates
Authors
Abstract
Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that action. In other words, there is a tight coupling between the learning dynamics and underlying execution policy. This coupling can cause performance degradation in noisy non-stationary environments. Here, we introduce Repeated Update Q-learning (RUQL), a learning algorithm that resolves the undesirable artifact of Q-learning while maintaining simplicity. We analyze the similarities and differences between RUQL, QL, and the closest state-of-the-art algorithms theoretically. Our analysis shows that RUQL maintains the convergence guarantee of QL in stationary environments, while relaxing the coupling between the execution policy and the learning dynamics. Experimental results confirm the theoretical insights and show how RUQL outperforms both QL and the closest state-of-the-art algorithms in noisy non-stationary environments.
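As a rough illustration of the coupling described above, the sketch below contrasts a standard Q-learning update with a repeated-update variant in which the step is effectively applied 1/π(s,a) times. This is a minimal sketch under that assumption, not the paper's exact formulation; the function names, the closed-form weight, and the choice of 1/π(s,a) repetitions are illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Standard Q-learning: move Q(s,a) one step toward the bootstrapped target."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def repeated_update(Q, s, a, r, s_next, alpha, gamma, pi_sa):
    """Repeated-update sketch: apply the Q-learning step as if it were repeated
    1/pi_sa times, so rarely selected actions are not updated more slowly in
    expectation.  Repeating Q <- (1-alpha)Q + alpha*target n times gives
    Q <- (1-alpha)^n Q + (1-(1-alpha)^n) target; here n = 1/pi_sa is an
    illustrative choice, not necessarily the paper's exact rule."""
    target = r + gamma * np.max(Q[s_next])
    keep = (1.0 - alpha) ** (1.0 / pi_sa)  # weight left on the old estimate
    Q[s, a] = keep * Q[s, a] + (1.0 - keep) * target

# Tiny usage example on a 2-state, 2-action table.
Q = np.zeros((2, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.9)
repeated_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.9, pi_sa=0.2)
```

Note how the residual weight on the old estimate shrinks as pi_sa gets smaller, so the effective learning rate no longer depends on how often the action is selected.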
Similar resources
Addressing the policy-bias of Q-learning by repeating updates
Q-learning is a very popular reinforcement learning algorithm that has been proven to converge to optimal policies in Markov decision processes. However, Q-learning shows artifacts in non-stationary environments, e.g., the probability of playing the optimal action may decrease if Q-values deviate significantly from the true values, a situation that may arise in the initial phase as well as after change...
Adaptive Multiagent Q-Learning with Initial Heuristic Approximation
The problem of effective coordination learning of multiple autonomous agents in a multiagent system (MAS) is one of the most complex challenges in artificial intelligence because of two principal obstacles: non-stationarity of the environment and exponential growth of its dimensionality with the number of agents. Non-stationarity of the environment is due to the dependence of the transition function ...
Multiagent Learning
One of the greatest difficulties in multiagent learning is that the environment is not stationary with respect to the agent. In the case of single-agent learning problems, the agent has to maximize its expected reward with respect to an environment that is stationary. In the case of multiagent scenarios, all the agents learning simultaneously poses a problem of non-stationarity in the environment w...
Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and critic. Our algorithm takes the minimum value between a pair of critics to restrict...
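The pair-of-critics mechanism mentioned in this abstract can be illustrated with a short sketch: the bootstrapped target uses the minimum of two critic estimates. The function name, the `done` masking, and the scalar inputs are illustrative assumptions, not the paper's code.

```python
import numpy as np

def clipped_double_q_target(r, q1_next, q2_next, gamma, done):
    """Bootstrap from the minimum of two critic estimates of the next
    state-action value, restricting the overestimation that a single
    over-approximating critic would otherwise propagate."""
    q_next = np.minimum(q1_next, q2_next)
    return r + gamma * (1.0 - done) * q_next

# Example: the two critics disagree; the target uses the more pessimistic one.
print(clipped_double_q_target(r=1.0, q1_next=5.2, q2_next=4.7, gamma=0.99, done=0.0))
```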
Incremental Sensorimotor Learning with Constant Update Complexity
The field of robotics is increasingly moving toward applications that involve unstructured human environments. This domain is challenging from a learning perspective, since subsequent observations are dependent and the environment is typically non-stationary. This non-stationarity is not limited to the external environment, as internal sensorimotor relationships may be subject to change as well...
Journal: Journal of Machine Learning Research
Volume: 17
Issue:
Pages: -
Publication date: 2016